Unsupervised training methods for discriminative language modeling

Authors

  • Erinç Dikici
  • Murat Saraclar
Abstract

Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, but they may be hard to obtain. Previous studies make use of the existing transcribed data to build a confusion model which boosts the training set by generating artificial data, a process known as semi-supervised training. In this study we concentrate on the unsupervised setting, where no manual transcriptions are available at all. We propose three ways to determine a sequence that could serve as the missing reference text, and two approaches which use this information to (i) determine the ranks of the ASR outputs in order to train the discriminative model directly, and (ii) build a confusion model in order to generate artificial training examples. We compare our techniques with the supervised and semi-supervised setups. Using the reranking variant of the WER-sensitive perceptron algorithm, we obtain word error rate improvements of up to half of those achieved in the supervised case.
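Since the abstract references the reranking variant of the WER-sensitive perceptron, a minimal sketch may help fix ideas. The Python below is an illustration under our own assumptions: the sparse feature representation, the update rule, and all names are ours, not the paper's.

```python
# Illustrative sketch of a WER-sensitive perceptron reranker for
# N-best lists (not the authors' exact algorithm). Each hypothesis
# is a (text, sparse-feature-dict) pair.
from collections import defaultdict

def wer(ref, hyp):
    """Word error rate of hyp against ref via edit distance."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = min(d[j] + 1,                          # deletion
                      d[j - 1] + 1,                      # insertion
                      prev + (r[i - 1] != h[j - 1]))     # substitution
            prev, d[j] = d[j], cur
    return d[-1] / max(len(r), 1)

def score(w, feats):
    return sum(w[f] * v for f, v in feats.items())

def train_reranker(nbest_lists, references, epochs=5, eta=0.1):
    """nbest_lists: one N-best list per utterance, each a list of
    (text, feats). references: one target word sequence per utterance."""
    w = defaultdict(float)
    for _ in range(epochs):
        for nbest, ref in zip(nbest_lists, references):
            errors = [wer(ref, text) for text, _ in nbest]
            oracle = min(range(len(nbest)), key=errors.__getitem__)
            best = max(range(len(nbest)),
                       key=lambda k: score(w, nbest[k][1]))
            if best != oracle:
                # WER-sensitive step: scale the update by how much
                # worse the model's choice is than the oracle.
                margin = errors[best] - errors[oracle]
                for f, v in nbest[oracle][1].items():
                    w[f] += eta * margin * v
                for f, v in nbest[best][1].items():
                    w[f] -= eta * margin * v
    return w
```

In the unsupervised setting studied here, the manual transcription passed as each reference would be replaced by one of the three automatically determined target sequences.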


Related articles

Phrasal Cohort Based Unsupervised Discriminative Language Modeling

Simulated confusions enable the use of large text-only corpora for discriminative language modeling by hallucinating the likely recognition outputs that each (correct) sentence would be confused with. In [1], a novel approach was introduced to simulate confusions using phrasal cohorts derived directly from recognition output. However, the described approach relied on transcribed speech to deriv...
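To make the confusion-simulation idea concrete, here is a toy sketch; the cohort data and substitution scheme are invented for illustration and are not the phrasal-cohort method of [1]:

```python
import random

# Toy cohorts: sets of phrases the recognizer tends to confuse.
# In the phrasal-cohort approach these are derived from recognition
# output; here they are hand-picked for illustration.
COHORTS = [
    {"their", "there"},
    {"recognize speech", "wreck a nice beach"},
]

def hallucinate(sentence, p=0.5, rng=random):
    """Simulate a likely ASR output for a correct sentence by
    swapping in confusable phrases with probability p."""
    out = sentence
    for cohort in COHORTS:
        for phrase in sorted(cohort, key=len, reverse=True):
            if phrase in out and rng.random() < p:
                out = out.replace(phrase, rng.choice(sorted(cohort - {phrase})))
                break  # at most one substitution per cohort
    return out

# Each (correct, hallucinated) pair becomes an artificial training
# example for the discriminative language model.
print(hallucinate("their plan was to recognize speech"))
```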


Lightly supervised training for risk-based discriminative language models

We propose a lightly supervised training method for a discriminative language model (DLM) based on risk minimization criteria. In lightly supervised training, pseudo labels generated by automatic speech recognition (ASR) are used as references. However, as these labels usually include recognition errors, the discriminative models estimated from such faulty reference labels may degrade ASR perfo...


Unsupervised discriminative language modeling using error rate estimator

Discriminative language modeling is a successful approach to improving speech recognition accuracy. However, it requires a large amount of spoken data and manually transcribed reference text for training. This paper proposes an unsupervised training method to overcome this handicap. The key idea is to use an error rate estimator, instead of calculating the true error rate from the reference. In...
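Loosely sketched below, and with the caveat that the linear form and its features are our assumptions rather than the paper's model, the estimator replaces the true, reference-based WER with a prediction computed from reference-free features of each hypothesis:

```python
# Sketch: rank N-best hypotheses by a predicted error rate computed
# from reference-free features (e.g. acoustic score, LM score,
# confidence), in place of the true WER against a manual transcript.

def estimated_wer(est_weights, feats):
    """Linear, reference-free estimate of a hypothesis' error rate."""
    return sum(est_weights.get(f, 0.0) * v for f, v in feats.items())

def pseudo_oracle(nbest_feats, est_weights):
    """Index of the hypothesis with the lowest *estimated* error.
    These estimated ranks can then stand in for reference-derived
    ranks when training the discriminative model."""
    return min(range(len(nbest_feats)),
               key=lambda k: estimated_wer(est_weights, nbest_feats[k]))
```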


Introspective Generative Modeling: Decide Discriminatively

We study unsupervised learning by developing introspective generative modeling (IGM) that attains a generator using progressively learned deep convolutional neural networks. The generator is itself a discriminator, capable of introspection: being able to self-evaluate the difference between its generated samples and the given training data. When followed by repeated discriminative learning, des...


Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation ...
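In caricature, the reverse-translation idea amounts to imputing the missing source side of target-language monolingual text; the function names below are placeholders, not the paper's interface:

```python
def impute_bitext(target_corpus, reverse_translate):
    """Build pseudo-parallel training pairs by reverse-translating
    target-language monolingual sentences into rough source-language
    sentences (reverse_translate is a placeholder for any such model)."""
    return [(reverse_translate(e), e) for e in target_corpus]

# The discriminative MT system is then trained on these imputed pairs
# as if they were real bitext, minimizing the resulting imputed risk.
```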



Journal:

Volume   Issue

Pages   -

Publication date: 2014